274 research outputs found
A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion
A finite-state method, based on leftmost longest-match replacement, is
presented for segmenting words into graphemes, and for converting graphemes
into phonemes. A small set of hand-crafted conversion rules for Dutch achieves
a phoneme accuracy of over 93%. The accuracy of the system is further improved
by using transformation-based learning. The phoneme accuracy of the best system
(using a large set of rule templates and a `lazy' variant of Brill's algoritm),
trained on only 40K words, reaches 99% accuracy.Comment: 8 page
Constraint-Based Categorial Grammar
We propose a generalization of Categorial Grammar in which lexical categories
are defined by means of recursive constraints. In particular, the introduction
of relational constraints allows one to capture the effects of (recursive)
lexical rules in a computationally attractive manner. We illustrate the
linguistic merits of the new approach by showing how it accounts for the syntax
of Dutch cross-serial dependencies and the position and scope of adjuncts in
such constructions. Delayed evaluation is used to process grammars containing
recursive constraints.Comment: 8 pages, LaTe
Probing for Dutch Relative Pronoun Choice
We propose a linguistically motivated version of the relative pronoun probing task for Dutch (where a model has to predict whether a masked token is either die or dat), collect realistic data for it using a parsed corpus, and probe the performance of four context-sensitive bert-based neural language models. Whereas the original task, which simply masked all occurrences of the words die and dat, was relatively easy, the linguistically motivated task turns out to be much harder. Models differ considerably in their performance, but a monolingual model trained on a heterogeneous corpus appears to be most robust
Using a Treebank for Finding Opposites
Proceedings of the Ninth International Workshop
on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), 139-150.
© 2010 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/15891
Mining Discourse Treebanks with XQuery
Proceedings of the Ninth International Workshop
on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), 245-256.
© 2010 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/15891
- …